Speech/laughter classification in meeting audio

Authors

  • Swe Zin Kalayar Khine
  • Tin Lay Nwe
  • Haizhou Li
Abstract

In this paper, harmonicity information is incorporated into acoustic features to detect laughter segments and speech segments. We implement our system using a hidden Markov model (HMM) classifier trained on features from Pitch and Harmonic Frequency Scale (PHFS) based subband filters. The harmonicity of the signal can be determined from the variation of the pitch and harmonics. Cascaded subband filters are spread over the pitch and harmonic frequency scale to describe the harmonicity information. The pitch bandwidth of the first layer spans 80 Hz to 300 Hz, and the entire band spans 80 Hz to 8 kHz. The experiments are conducted on the ICSI meeting corpus (BMR and BED meetings). We achieve an average error rate of 0.84% for the ‘BMR’ meetings and 3.64% for the ‘BED’ meetings in segment-level speech and laughter detection. The results show that the proposed PHFS-based feature is robust and effective.
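
As a rough illustration of what "harmonicity" in the 80 Hz to 300 Hz pitch range can mean in practice, the sketch below estimates a frame's pitch and a periodicity score with plain normalized autocorrelation. This is not the PHFS filter bank from the paper, whose design is not given in the abstract; the function name `pitch_and_harmonicity`, the 16 kHz sample rate, and the 40 ms frame length are assumptions made only for this example.

```python
import math


def pitch_and_harmonicity(frame, sample_rate=16000, f_min=80.0, f_max=300.0):
    """Return (pitch_hz, peak) for one frame via normalized autocorrelation.

    Hypothetical helper, not the paper's PHFS feature. `peak` is the
    autocorrelation at the best lag inside the f_min..f_max pitch range,
    divided by the frame energy; values near 1 suggest a periodic frame.
    """
    n = len(frame)
    if n == 0:
        return 0.0, 0.0
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:                      # silent frame: no pitch
        return 0.0, 0.0

    lag_min = int(sample_rate / f_max)     # shortest period considered
    lag_max = int(sample_rate / f_min)     # longest period considered
    best_lag, best_val = 0, -1.0
    for lag in range(lag_min, min(lag_max, n - 1) + 1):
        acc = sum(x[i] * x[i + lag] for i in range(n - lag))
        val = acc / energy
        if val > best_val:
            best_lag, best_val = lag, val
    if best_lag == 0:                      # frame too short for the range
        return 0.0, 0.0
    return sample_rate / best_lag, best_val


# Usage: a synthetic 200 Hz tone, 40 ms long at the assumed 16 kHz rate.
tone = [math.sin(2 * math.pi * 200.0 * t / 16000) for t in range(640)]
pitch_hz, peak = pitch_and_harmonicity(tone)
```

A per-frame score like `peak`, tracked over time, is one simple way to capture how pitch and harmonic structure vary; the paper instead derives its features from cascaded subband filters and feeds them to an HMM.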

Similar resources

Laughter Detection in Noisy Settings

Spontaneous human speech contains many sounds that are not proper speech yet carry meaning, laughter being a good example. Distinguishing such sounds from speech sounds could improve speech recognition systems as well as widen the communicative range of automatic dialogue systems. Our goal is to develop methods for automatic classification of non-speech vocal sounds. As laughter varies widely be...

Fusion for Audio-Visual Laughter Detection

Laughter is a highly variable signal, and can express a spectrum of emotions. This makes the automatic detection of laughter a challenging but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is performed by combining (fusing) the results of a separate audio and video classifier on the decision level. ...

Decision-Level Fusion for Audio-Visual Laughter Detection

Laughter is a highly variable signal, which can be caused by a spectrum of emotions. This makes the automatic detection of laughter a challenging, but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audiovisual laughter detection is performed by fusing the results of separate audio and video classifiers on the decision level. This r...

Combining acoustic and visual features to detect laughter in adults' speech

Laughter can not only convey the affective state of the speaker but also be perceived differently based on the context in which it is used. In this paper, we focus on detecting laughter in adults’ speech using the MAHNOB laughter database. The paper explores the use of novel long-term acoustic features to capture the periodic nature of laughter and the use of computer vision-based smile feature...

Laughter and filler detection in naturalistic audio

Laughter and fillers are common phenomena in speech, and play an important role in communication. In this study, we present Deep Neural Network (DNN) and Convolutional Neural Network (CNN) based systems to classify non-verbal cues (laughter and fillers) from verbal speech in naturalistic audio. We propose improvements over a deep learning system proposed in [1]. Particularly, we propose a simp...

Publication date: 2008